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BLOCK  REFLECTORS:  THEORY  AND  COMPUTATION' 
ROBERT  SCHREIBERt  AND  BERESFORD  PARLETTt 


Abstract.  A  block  reflector  is  an  orthogonal,  symmetric  matrix  that  reverses  a 
subspace  whose  dimension  may  be  greater  than  one.  We  shall  develop  the  properties 
of  block  reflectors  and  give  some  algorithms  for  computing  a  block  reflector  that 
introduces  a  block  of  zeros  into  a  matrix.  We  consider  the  compact,  representation 
of  block  reflectors,  some  applications,  and  their  use  in  parallel  computers. 

1.  Introduction.  Block  reflectors  are  orthogonal,  symmetric  matrices  with 
possibly  more  than  ono  negative  eigenvalue.  They  are  a  natural  generalization  of  the 
elementary  reflectors  (also  known  as  Hotischolder  transformations)  that  are  widely 
used  in  matrix  computation.  Block  reflectors  have  similar  uses. 

We  shall  develop  a  theory  of  block  reflectors  and  their  computation.  We  also 
discuss  some  applications  of  block  refle<  tors,  give  some  numerical  results  showing 
the  stability  of  our  algorithm,  and  show  how  this  algorithm  is  well  matched  to  the 
capabilities  of  some  new,  fast  scientific  computers. 

Block  reflectors  are  not  new.  Bronlund  and  Johnsen  gave  a  method  for  orthog¬ 
onal  reduction  to  block  upper  triangular  form,  but  the  orthogonal  transformations 
were  nonsymmetric  [2].  Dietrich  derived  the  block  reflector  as  we  discuss  it  here, 
gave  a  stable  method  for  computing  it,  and  showed  how  it  may  be  used  for  reduc¬ 
tion  to  block  upper  triangular  form  [4|.  Kaufman  has  considered  the  use  of  block 
reflectors  for  block  triangularization  of  a  sparse  matrix  [8]. 

In  this  paper,  we  make  the  following  contributions.  First,  we  derive  a  complete 
theory  of  block  reflectors,  clearly  showing  the  parallels  between  the  block  and  the 
point  theory.  Our  presentation  is  considerably  more  direct  than  that  found  in  [4]. 


t  Dtpaitment  of  Competor  Scieacc,  RenMtUer  Polytechnic  hutHote,  TVoy,  New  York  12180- 
3590.  This  research  was  partially  sapportsd  by  the  Sa;^  Computer  Corporation  and  the  Office  of 
Naval  Research  onder  Contract  N00014-8a-K-0610. 

^  Department  of  Mathematics,  University  of  California,  Berkeley,  California,  94720.  This 
research  was  partially  snpported  by  the  Office  of  Naval  Reeearch  ender  Contract  N00014-85-K-0180. 

*  SIAM  Jonmal  on  Numerical  Analysis,  to  appear.  A  preUminary  version  of  this  paper  ap¬ 
peared  in  R.  Gloeriaski  and  J.L.  Lions  eds.,  CompmUr  Metkode  m  Applitd  Seiencer  end  EnpmMrinp 
VII,  North-Holland,  Amsterdam,  1980. 


1 


In  addition  to  existence,  we  also  answer  the  question  of  the  uniqueness  of  the 
block  reflector,  giving  it  in  its  most  g(‘nerat  form.  Then  we  present  a  new  view  of 
the  theory;  our  vantage  point  is  the  operator  angle  between  two  subspaces.  This 
more  clearly  reveals  the  structure  of  ihc  block  reflector  used  to  map  between  two 
subspaces,  and  also  leads  to  new  algorithms. 

After  providing  two  applications,  we  show  how  to  construct  a  block  reflector 
that  introduces  a  zero  block  into  a  matrix.  Four  algorithms  are  presented.  One  is  a 
new  version  of  Dietrich’s  stable  method.  The  role  played  by  the  polar  decomposition 
of  a  matrix  in  this  method  is  revealed.  A  second,  related  algorithm  can  be  used 
to  construct  any  of  the  several  block  reflectors  that  map  between  a  given  pair  of 
subspaces;  Dietrich's  method  can  be  used  to  construct  only  one  of  these.  Two  new 
algorithms  based  on  the  operator  angle  are  also  given.  Finally,  the  efficiency  of 
these  methods  on  modern  parallel  computers  is  examined.  A  numerical  experiment 
illustrates  their  accuracy,  even  for  very  badly  conditioned  matrices. 

Bischof  and  Van  Loan  have  pursued  a  somewhat  different  approach  [1].  They 
develop  a  representation  for  the  product  of  several  elementary  reflectors  of  the 
form  I  —  Wy,  where  W  and  Y  are  rectangular  matrices.  With  this  representation, 
the  usual  orthogonal  reduction  to  triangular  form  can  be  organized  so  that  it  is 
dominated  by  matrix  multiplications,  an  important  virtue,  as  we  explain  below. 

For  any  matrix  X,  R{X)  denotes  the  range  of  X.  For  any  subspace  Y,  Y^ 
denotes  the  orthogonal  complement  of  Y. 

2.  Block  reflectors:  Theory. 

2.1.  Deflnition.  Given  any  Z  €  R"***,  m  >  n,  “the  reflector  that  reverses 
the  range  of  Z"  b  given  by 

H  =  H{Z)  :=  /„  -  ZWZ‘ 

where 

W  =  2{Z*Z)'^  €  R"’'* 

is  the  (symmetric)  pseudo-inverse  of  ^(Z^Z).  (See  [6,  p.  139]  for  a  fuller  description 
of  the  pseudo-inverse.)  Thus 

ZWZ*  =  2Pz 

where  Pz  b  the  orthogonal  projector  on  R{Z). 

When  n  =  1,  H  is  an  elementary  reflector  or  Householder  transformation. 
When  Z*Z  b  mvertibie,  W  =  (J’Z*Z)-*.  Note  that  if  Z  =  0,  then  W  =  0  and 
R(0)  s  /.  Also,  H{Z)Z  =  -Z.  (See  I^mma  1  for  proof.) 

Thb  choice  of  W  makes  H  orthogonal  as  well  as  symmetric.  Hence,  —  7, 
the  reflector  property. 

It  b  also  easy  to  verify  that  if  iZ(Z)  =  R(Zi)i  then  H{Z)  =  H{Zi).  Thus,  for 
example,  HiZT)  ^  H{Z)  for  any  invertible  T,  and  H(ZW)  »  lf(Z)  as  well. 
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Since  H  is  orthogonal  and  symmetric,  its  eigenvalues  are  1  and  -1.  The  mul¬ 
tiplicity  of  —1  is  equal  to  the  dimension  of  the  space  reversed  by  H;  this  in  turn  b 
equal  to  the  rank  of  Z. 

2.2.  Essential  properties.  Every  m  x  m  orthogonal  reflector  H  induces 
a  decomposition  of  R”*  into  the  direct  sum  of  two  perpendicular  subspaces;  one 
subspace  b  B  invariant  and  the  other  b  reversed  by  H.  The  following  lemma  states 
that  for  the  reflector  H{Z),  the  reversed  subspace  is  R(Z). 

Lemma  l.  Ut  Z  e  For  all  z€.  R{Z), 

(la)  H(Z)z=-z 
and  for  all  y  €.  R[Z)-^ , 

(lb)  HiZ)y  =  y. 


Proof.  We  require  the  fact  that,  for  any  matrix  B, 

=  B  , 

which  is  easy  to  prove  tising  the  singular  value  decomposition  (SVD)  of  B.  (See  [6, 
pp.  16-20j  for  more  information  on  the  SVD.)  Now  let  z  €  R(Z).  Then  there  exists 
X,  such  that  z  =  Zz.  Using  the  fact  above  and  the  definition  of  B(Z), 

ff(Z)z  =  Zx  ~  ZWZ^Zx 
—  Zx  ~  2Zx 
=  -z 

so  that  (la)  is  proved.  Property  (lb)  comes  from  applying  H{Z) 
that  Z^y  =  0. 

2.3.  The  standard  task.  Let  E  =  €  R”***,  with 

square.  We  seek  a  block  reflector  H  such  that 

«£  =  ?:=  (J) 

with  F  square.  We  now  give  conditions  on  F  that  are  necessary  and  sufficient  for 
the  exbtence  of  H: 

(2a)  ISOMETRY  PROPERTY:  F*F  =  E*H*HE  =  E^E 

and 

(2b)  SYMMETRY  PROPERTY:  F^F  =  =  symmetric. 


to  y  and  noting 
QED 


m  >  n,  and  E\ 
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Define  the  associated  matrices: 


(3a) 


F  -  El 

-Ei  . 


(3b)  S  :=  F  +  £. 

[D  is  for  “difference,”  S  is  for  “sum.”) 

An  important  consequence  of  the  conditions  (2)  is  this: 
Lemma  2.  If  F  satisfies  (2),  then 

D*S  =  0. 


Proof.  By  (2), 


D^S  =  (fT  -  E^E)  +  (f‘£  -  E*F) 

=  (F‘F  -  £'F)  ^  (F‘£i  -  E[F) 

=  0  +  0 .  QED 

Theorem  l.  There  exists  a  block  reflector  H  such  that  HE  ^  F  if  and  only  if 
F  satisfies  (2). 

Proof.  Necessity  is  clear:  equation  (2a)  follows  from  the  orthogonality  of  H, 
while  (2b)  is  obvious  from  the  symmetry  of  H. 

To  show  sufficiency  we  shall  prove  that  H{D)E  =  7.  Since  E  =  |(S  -  £),  we 

have 

H{D)E  =  1  |ir(D)S  -  H(D)D\  , 

=  ^  [5  +  £]  (using  Lemmas  1  and  2) 

=  7.  QED 

Clearly  if  F  satisfies  (2)  then  -F  does  too.  Furthermore,  H{S)E  =  -T.  If  F 
and  E  satisfy  (2)  we  shall  say  that  F  is  a  mirror  image  of  E. 

The  concept  of  a  block  reflector,  the  necessary  and  sufficient  conditions,  and 
a  solution  to  the  standard  task  for  the  case  n  =  2  were  given  by  Tang  Ling  in  an 
unpublished  manuscript  |10]. 

2.3.1.  Representing  H.  It  may  not  be  advisable  to  represent  by  £,  F, 
and  W  as  above.  This  form  is  very  attractive  if  E  must  be  preserved  or  when  E  is 
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sparse.  On  the  other  hand,  when  E  has  rank  r  <  n.  then  it  saves  storage  to  find 
an  m  X  r  matrix  G  such  that  H  =  I  —  and  G*G  -  2/r.  (We  later  show  that 
this  is  possible).  Storage  of  G  requires  mr  words,  as  opposed  to  mn  +  n*  for  Z  and 
W,  mn  +  2n*  for  E,F,  and  W,  and  2mn  for  the  WY  representation  of  Bischof  and 
Van  Loan.  Computing  Hx  for  a  vector  z  costs  2mr  flops  (1  flop  is  one  multiply  and 
one  add)  compared  with  2mn  +  for  the  Z^W  representation.  An  algorithm  for 
computing  G  is  given  in  §3.2. 

2.4.  The  form  of  F.  F  is  far  from  unique,  even  when  E  has  full  rank 
n.  If  fTi  =0  then  the  symmetry  condition  is  vacuous  and  we  may  choose  any  F 
satisfying  the  isometry  condition  F*F  =  E*E.  At  the  other  extreme,  when  E\  is 
invertible,  then  F  must  have  the  form  F  -  ME\  with  M  symmetric.  Now  the 
isometry’  condition  requires 

a/2  -  e^^{e^e)e;^  . 

There  are  2*  solutions,  namely  M  -  V  diag(±v^»±V^, where 

Ef‘(£:‘£)£f‘  =  V  diag(A,,---,An)V* 

is  the  spectral  factorization.  Note  that  there  are  2”  different  solutions  for 
every  spectral  factorization  of  E~^{E*E)E~*.  And  with  repeated  eigenvalues  there 
are  infinitely  many  such  factorizations. 

The  two  “extreme”  solutions  are  F  =  ±  Fi,  where  A*/*  b 

the  positive  definite  square  root  of  A. 

Example.  Let 


where  V*  =  V"*;  i.e.,  V  a  orthogonal.  Then  Ff^(F*F)F|“*  =  VV*  =  /.  Thus 
we  may  take  Af  to  be  any  symmetric  orthogonal  matrix  (Af^  =  /);  in  particular, 
A/  =  7  will  do.  Then  F  =  AfV,  and 

?)Q=(0- 

2.4.1.  The  general  case.  Let  Ei  ^  PEQ*  with  P*P  =  Q^Q  =  7^^,  S 
positive  definite  and  diagonal,  r\  <  n.  Thb  b  the  "short”  SVD  of  Fi,  where 
ri  =  rank(Fi).  The  symmetry  condition  (2b)  requires  that 

QZP*F  =  symmetric  . 

The  general  solution  for  F  b 

*?„)(§ 


where  P,  Q  ^  make  {P,P)  and  (Q,Q)  orthogonal.  M\i  must  be  sym¬ 

metric,  but  Mil  and  M22  free. 

The  isometry  condition  (2a)  yields 

For  each  choice  of  P  and  Q  we  can  find  A/n,  M21,  and  M22  such  that  this  equation 
holds.  The  solutions  are  not  unique.  We  solve 

(4a)  M^^Mii  ^  Q^E^EQ, 

(4b)  Mi^Mii  =  g'fT'EQE"’ , 

(4c)  Ml  =  E-‘(EQ)‘(E(?)E-'  -  MiiMii . 

Even  when  M22  is  singular,  (4b)  is  consistent.  Nevertheless,  the  system  (4b)  may 

be  ill  conditioned,  so  we  do  not  propose  to  use  (4)  in  computations. 

2.5.  Is  H  unique? 

Suppose  F  satisfies  the  isometry  and  symmetry  conditions  (2).  We  have  shown 
that  the  choice  Z  ^  D  provides  a  block  reflector  H  such  that  HE  —  F.  h  this  the 
only  such  H{Z)7  The  answer  depends  on  the  rank  of  D  and  the  rank  of  S  in  (3) 
above. 

THEOREM  2.  Ut  F  satisfy  (2).  Let  H  -  H{Z).  The  conditions 
(5a)  P(D)  C  R{Z) 

and 

(5b)  R{Z)±RiS),  i.e.,  Z*S  =  0 

are  necessary  and  sufficient  for  H{Z)E  to  be  equal  to  7. 

Proof.  Recall  that  E=|(5-E).  IfZ  satisfies  (5)  then,  as  in  the  proof  of 
Theorem  1, 

H[Z)E  =  i(jy5  -  HD) 

=  |(s  +  ») 

=  7 


by  Lemmas  1  and  2. 

On  the  other  hand,  if  HE  =  7,  then 

T^E-ZWZ*E, 
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so  that 


D  =  -ZWZ^E 
=  Z{  wz% 

which  implies  (5a).  Now  that  (5a)  is  established,  we  may  use  it  to  prove  that  (5b) 
holds.  A  consequence  of  (5a)  is  that  H{Z)D  —  ~D.  Rewriting  this  relation  yields 

ZWZ*D  ~  2D  . 

.Using  this  equation  we  obtain 

F=  HE 

=  E  -  ^ZWZ^(S  -  D) 

=  E  +  D-  -ZWZ^S 
2 

=  F-]-ZWZ^S. 

2 

So  ZWZ^S  =  0.  Since,  as  we  noted  earlier,  is  the  orthogonal  projector  on 

R(Z)  ,  the  columns  of  5  are  orthogonal  to  those  of  Z  .  This  implies  (5b).  QED 

Now  suppose  that  H{Z)E  =  F.  By  Theorem  1,  F  satisfies  (2)  and  by  Theorem 
2,  Z  satisfies  (5).  Let  us  define 

fiZ  *  r«nk(Z)  ,  pD  =  rank(l?)  ,  ps  ~  rank(5)  . 

Since  Z,D,  and  5  belong  to  B.'"’*"  , 

(6a)  pz  <  n;  PD  <  «;  PS  <  n. 

■  Further,  by  (5a) 

(6b)  PD'^PZ 

and  by  (5b) 

(6c)  PZ  PS- 

Since  F  satisfies  (2),  Lemma  2  applies,  so  that 
(6d)  PD  +  PS  <  w». 

(We  could  conclude  (6d)  from  (6b)  and  (6c),  but  it  is  true  independent  of  the 
existence  of  Z,  as  we  have  shown.) 

Now  we  can  say  when  H  is  unique. 
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Theorem  3.  Let  F  be  a  mirror  image  of  E.  Then 

1.  if  pd  —  then  H{D)  is  the  unique  block  reflector  satisfying  HE  =  F; 

2.  if  PD  +  PS  =  then  H{D)  is  the  unique  block  reflector  satisfying  HE  =  F; 

S.  if  PD  <  inin(n,  m  -  ps),  then  H  is  not  unique. 

Proof.  Suppose  H{Z)E  =  F.  If  po  -  n,  then  it  is  clear  from  (5a)  and  (6a) 
that  R[Z)  =  R{D),  so  H(Z)  =  H{D)  is  the  unique  block  reflector  that  reverses 
R(D).  Similarly,  if  pd  +  PS  =  then  by  (5a),  (5b),  (6b)  and  (6c),  we  must  have 
R{Z)  =  R{D),  and  again  H(Z)  =  H(D)  is  the  unique  block  reflector  that  reverses 
i2(jD).  Finally,  if  po  <  min(n,m  -  ps),  we  may  choose  any  matrix  Z  C  R’"’'" 
whose  range  contains  R[D)  and  is  orthogonal  to  R(5).  By  our  assumption,  there 
exist  such  Z  of  rank  pp^PD  +  1,. ..  ,min(n,m  -  ps).  QED 

Corollary.  If  E^  has  rank  n,  then  H  is  unique. 

Proof.  Since 


D  = 


it  follows  that  n  >  rank(D)  >  rank(£})  =  n.  QED 

In  the  case  of  Householder  transformations  (n  =  1),  the  condition  of  the  corol¬ 
lary  is  satisfied  unless  Ei  —  0,  in  which  case  the  standard  task  is  not  much  of 
a  task  at  all!  Thus,  like  the  symmetry  condition  (2b),  the  possibility  of  genuine 
nonuniqueness  of  the  reflector  H  only  appears  in  the  multidimensional  case. 

Examples.  Let 


The  three  matrices 


1  O' 

0  1  . 

lo  oj 


(o  l)  ’  “  (o 


all  satisfy  the  conditions  (2).  With  Fo,  pr  =  0  and  ps  ~  2  and  we  have  0  <  <  1. 

We  may  choose 


Ho  = 


1  0 
0  1 
0  0 


0 

0 

±1 


Lh  F\  we  have  pp  =  1  uid 


0 

-1 

0 


0 

0 

±1 


With  Ft  we  have  pd  -  2  and  p^  =  0  and  .2  <  P2  <  2.  We  must  choose 


-1 


0 

0 


0  0 

-1  0 

0  1 
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This  example  shows  that  nonuniqueness  of  F  is  possible  ano  that  uniqueness 
of  H  is  possible  even  with  singular  E^. 

For  a  contrasting  example,  let 


0  0  0 

1  0  r 

E  = 

0  0  0 

1  0  1 

,  F  = 

0  1  1 

0  0  0 

0  1  1 

0  0  0 

Then  pd  =  2,  ps  =  2,  ai.J  2  <  <  2.  Thus  H  =  H{D)  is  unique,  even  though  E 

is  rank-deficient. 


2.6.  The  angle  between  R{E)  and  ^([o])'  section  we  shall 

rcderive  many  of  our  results  using  the  operator  angle  between  R{E)  and 

For  a  complete  discussion  of  the  angle  between  subspaces  see  Davis  and  Kahan 
j3].  This  rederivation  gives  us  a  new  view  of  the  block  reflector  that  allows  some 
geometric  insight  not  available  otherwise.  It  also  leads  to  some  algorithms  that 
would  not  be  discovered  from  the  algebraic  perspective  of  the  earlier  sections. 


Let  r  =  rank(£).  Let  the  columns  of  P  €  ^  orthonormal  basis  for 

/2(E).  We  disctiss  the  problem  of  finding  a  block  reflector  H  that  performs  the 
standard  task  for  P  rather  than  E;  since  they  have  the  same  range,  this  H  also 
performs  the  standard  task  for  E:  If  E  =  PT  and  E  is  a  block  reflector  such  that 

HP  =  [  0  ^  square,  then  HE  =  Thus,  for  the  moment,  we  work  with  P 

rather  than  E. 


Let 


where  Pi  is  square.  Let 


Pi  =  QiMi 


be  a  polar  decomposition  of  Pi.  The  factor  Mi  is  the  symmetric,  nonnegative 
definite  square  root  of  PfPi,  and  is  unique.  The  other  factor,  Qi,  is  orthogonal 
and  is  unique  only  if  Pi  is  nonsingular.  Since  P  has  orthonormal  columns,  the 
eigenvalues  of  Mi  all  lie  in  [0,1]  (see  [6,  p.  22]).  Higham  [7]  discusses  the  polar 
decomposition  and  gives  an  efficient  algorithm  for  computing  it. 

It  is  simple  to  show  that  <?i  is  a  mirror  im^e  of  P.  In  fact,  a  version  of  the 
converse  is  also  true.  If  Q  is  any  mirror  image  of  P  then 


Pi^QM 

is  a  factorization  of  Pi  into  an  orthogonal-symmetric  product.  (The  choice  is  in  the 
signs  of  the  eigenvalues  of  Af,  as  it  was  in  §2.4).  For  the  moment  we  choose  to  work 
with  the  polar  factorization. 
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Define  Qi  =  ^  j . 


We  now  write 


M\  =  cos  2©  ; 


where  the  angle  6  is  defined  by 


e  =  Vdiag{ei,-.,d,)V‘, 


where 

0  <  tf  1  ^  ^  f  ^  ~  . 

4 

The  eigenvectors  of  M\  are  the  columns  of  V'  and  its  eigenvalues  are  1  >  cos  28\  > 
•  •  •  >  cos  >  0  . 

W'e  also  factor 


(7)  Pj  —  QiMz 

where  Afj  =  (/  -  =  sin  2©  is  symmetric,  nonnegative  definite  and  Q2  € 

^m-rxr  form  of  Q2  Will  be  clarified  below.  We  may  choose  Q2  so  that 

(8)  <?3Q3(sin  ©)  =  sin  ©  . 

It  is  easy  to  prove  this  by  using  the  C-S  decomposition  of  P  [13]. 

From  this  new  viewpoint  we  obtain  several  new  formulas.  First 


Pi  =  QiMi  =  Qi  co82©  ,  Pz  ~  Qz^z  —  Qz 2©  ; 

hence 

(9)  (/  +  =  VSeos©  ,  (/-Ml)*/*  =  v/2sin©  . 

Next, 

(10)  («■;/•) = 
and 

Thus, 

(12)  i5*5  =  2coe*© 

2 
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and 


=  285n*e  . 


(13) 

The  definition  of  the  angles  {0j}  insures  that  cos*  ©  is  both  nonsingular  and  well 
conditioned.  Define 

(14)  G^G\ 
where 

('*)  • 

By  (10)  and  (12),  since  (cos*©)"^  =  (cos*©)  *, 

H(S)  =  //,  . 

Note  the  analogy  with  the  case  n  =  1  where,  if  2$  is  the  angle  between  e  (that 
is,  E)  and  the  ei-axis,  then  H  —  I  2v+t^^^  where  —  (co8  0,sintf)^ 

We  now  consider  H(D).  It  is  important  to  be  able  to  construct  H{D)  since  in 
some  instances  it  is  what  we  want.  In  particular,  if  E  «  ^orth<^onal^  H[D) 
produces  a  small  change  to  E. 

.Now  note  that  0  is  not  neccfssarily  of  full  rank.  In  fact 

PO  ••  rank(i7) 

=  rankiD^D) 

=  rank(sin©) 

=  rank(8tn2©) 

=  rank(M2) 

=  rank(P3) 

<  min(r,m  -  r)  =  a  . 

If  any  of  the  a  singular  values  of  Pj  is  zero  then  po  <  a  and  the  angles  =  •  •  •  = 
=  0.  In  this  case  the  analog  to  (14),  namely 

(16)  H.  =  l-  C7-Gi 

where 

(IT)  = 

fails  in  the  sense  that  H{D)  ^  For  it  is  clear  that  ranh(C-)  =  a,  so  that  H- 
reverses  an  a-diimensional  subspace;  but  B{D)  reverses  only  R{D),  which  is  just 
P/Hlimensional.  Since,  fay  assumption,  po  <  the  two  block  reflectors  must  differ. 
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By  Theorem  3,  however,  H[D)  is  not  unique.  We  may  therefore  ask  whether 
H-  satisfies  H-P  =  Qi  despite  the  fact  that  it  is  not  H{D).  According  to  Theorem 
2,  it  does  if 

(18)  i2(£>)  C  R{G.) 
and 

(19)  C7‘_S  =  0. 

But  (18)  follows  from  the  characterization  (11)  of  D  and  the  definition  (17)  of  C7_. 
The  orthogonality  (19)  follows  from  property  (8)  of  Q^. 

When  D  is  rank  deficient  there  may  be  other  choices  as  well.  In  fact,  if  po  < 
s  -  1  then  there  are  block  reflectors  that  reflect  P  into  Qi  and  reverse  subspaces 
of  dimension  greater  than  p/j  and  less  than  s.  The  two  that  we  have  exhibited  are 
the  extreme  cases:  H{D)  reverses  the  smallest  possible  subspace  [i2(I>)],  while  H- 
reverses  the  largest  [^(5)-*-). 

2.6.1.  Other  choices  for  F.  W'ere  we  to  take  any  other  orthogonal- 
symmetric  factorization  {QiMi  with  Mi  indefinite)  of  Pi,  we  could  still  construct 
block  reflectors  H±  by  (14)  and  (16).  Now,  in  general,  neither  H{S)  =  nor 
H{D)  =  H~.  But  nevertheless,  H±P  =  tQi- 

2.6.2.  Other  representations  for  //.  When  m  =  2r  we  may  write 

=  T {«■  e  «.)  ]  («>  »««)'• 

in  exact  analogy  to  the  elementary  case.  In  general,  m  ^  2r  and  we  have  that 

^  (o  I- QiQ[ Qi)  [sUle  - c^?©]  ® • 

S.  Applications  and  computation. 

3.1.  Applications. 

1.  Optimal  error  bounds.  Let  the  coltunns  of  U  be  approximate  eigenvectors  for 
some  symmetric  A  €  Let  A  =  diag  (9i,  ■ " , 9i»)  be  approximate  eigenvalues. 

Let  X  :=  AU  -  UA  be  a  residual  matrix.  Next  map  X  into  its  mirror  image 
by  a  suitable  block  reflector  H.  Then  form  the  auxiliary  symmetric  matrix 

r  =  r(v')  =  (*  y') 

where  V  is  at  our  disposal. 

By  choosing  suitable  V  and  computing  the  eigenvalues  of  T{V)  error  bounds 
may  be  obtained  on  the  ^proximate  values  **’,#».  In  several  important  cases 
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V  can  be  chosen  so  that  the  bounds  are  optimal  for  the  given  information.  See 
jll,  §§10-4  —  10-9]  for  more  details. 

The  point  of  interest  here  is  that  the  residual  matrix  X  is  likely  to  have  lower 
rank  than  is  revealed  by  its  columns  alone. 

2.  Block  Hessenbcrg  form.  It  is  possible  to  reduce  a  matrix  B  6  to  block 

upper  Hessenberg  form  by  explicit  orthogonal  similarity  transformations 


<^11 

Ci2 

Ci3 

Gu 

Cis' 

B^C  =  H‘BH  = 

C21 

C22 

C23 

C24 

C25 

0 

C32 

Czz 

C34 

C35 

0 

0 

C43 

G44 

C45 

0 

0 

0 

Cu 

C55. 

Here  H  represents  a  product  of  three  block  reflectors,  H  -  The  first  step 

is  typical.  We  seek  Hi  so  that 


B21 

C21 

Bzi 

0 

B41 

0 

.Psi. 

0 

In  tnese  circumstances  we  expect  full  rank  to  be  maintained.  It  may  not  pay  to  try 
and  represent  Hi  =  Im-  GiG\  where  Gi  €  R”’*’'  since  usually  r  =  n. 

Block  QR  factorizations  can  be  computed  in  a  similar  manner,  by  applying  a 
sequence  of  block  reflectors  to  a  matrix  [2,4]. 

3.3.  Stable  computation  of  the  block  reflector.  Recall  that  E  €  R*"^* 
b  given  and  we  seek  a  block  reflector  H  =  H(Z)  such  that  HE  =  7  —  (q)  for  some 
nx  n  matrix  F.  In  thb  section  we  shall  describe  four  elegant  and  stable  constructions 
for  mirror  images  F  of  E  and  of  matrices  G  €  R'"’*'  such  that  G*G  s  2/r  and  the 

block  reflector  H  =  I  ~  GG*  maps  between  R{E)  and  ^  ([o])’  these. 

Algorithm  2,  appears  in  a  slightly  diflbrent  form  in  [4]. 

As  in  §2.6,  we  suppose  we  have  a  matrix  P  €  R"*^^  such  that  R(E)  C  R(P) 
and  P  has  orthonorma!  columns.  Thus  P*P  =  /r,  and  E  =  PT  for  some  T  €  R'’**. 
We  can  easily  find  T  since 

T  =  P*E. 

Let  P  =  j  with  Pi  square.  Let  Pi  =  QiAfi  be  a  polar  decomposition  of 
Pi.  The  orthogonal  polar  factor  <?i  b  a  mirror  image  of  P.  With  thb  choice. 


(20) 


Pt 


Then  as  we  have  seen,  H  »  H{S)  satbfies  HP 


which  b  our  objective. 


IS 


It  remains  only  to  find  a  convenient  representation  for  H.  The  one  given  by 
(14)-(15)  is  a  possibility.  We  shall  compute  the  necessary  matrices  cos  9  and  sin  8 
using  (9).  In  the  computation  of  sin  0,  however,  it  is  possible  that  cancellation  of 
nearly  equal  elements  on  the  diagonal  can  spoil  the  formation  of  I  -  M\.  Following 
ill,  p.  91]  in  the  one-dimensional  case,  we  may  use  the  relation 

I -Ml  =  {I  +  Mi)-'{J  -  Mf) 

=  (/  +  «,)-*(/ 

=  (I  +  M,)-'(PiPl) 


to  construct  I  -  Mi  without  any  matrix  subtraction. 

This  leads  to  the  following  algorithm.  Recall  that  Mi  —  cos 26  and  M2  = 
sin  26. 


.\LGORITHM  1.  Compute  F  €  R*’'*  and  f  R'*’'’’,  r  <  n,  such  that 
H^E^F,  where  =  I  -  G^G\. 

1.  Find  P  €  R’"'‘'  and  T  €  R'^"  such  that  R{E)  C  R(P)  ,P‘P  =  /,  and 
PT  =  E; 

2.  \QuMi\ ;=  polar(Pi)  and  \Q2,M2\  :=  polarfPj),  where  P  =  ( 

4.  cose  ;=  (/+  MiY^^  and  sin 6  :»  ((/  r  Afj)"‘P2Pj)*'^^ 

5.  w.-[Qj3ineJ- 


In  Step  1,  the  required  orthonormal  matrix  P  may  be  obtmned  using  a  QR- 
factorization  of  E,  with  column  pivoting  if  we  wish  r  to  be  minimal  and  R{E)  = 
P(P). 


There  are  no  additional  difficulties  in  computing  G-  and  H.  according  to 
(16)-(17).  Only  Step  5  of  Algorithm  1  needs  to  be  replaced,  by 
Qi  sin  6 


I  * 


5a.  [_g,coB6, 

We  give  the  resulting  algorithm  the  name  “Algorithm  la” 


The  two  polar  decompostions  and  two  square  roots  add  to  the  cost  of  Algorithm 
1.  It  may  well  be  that  the  cost  of  applying  H  to  some  large  matrix  so  dominates  the 
cost  of  contructing  it  that  this  is  insignificant.  We  shall  try,  nevertheless,  to  reduce 
this  initial  cost  by  seeking  alternatives  to  the  representation  given  by  (14)-(l'5j.  'We 
therefore  seek  a  new  representation  of  as  /  -  where  By 

(20),  or  equally  well  by  (9)  and  (13), 


(ji) 


15*5  -  1  +  i''i»  +  20i^l] 

—  /  +  Ml  . 

.  ✓ 
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Now  let  R  be  the  upper  triangular  Cholesky  factor  of  /  +  Mi.  Since  the 
eigenvalues  of  /  +  M\  are  ail  in  [I,  2',  both  R  and  I  ■+■  Mi  are  extremely  well 
conditioned.  Define 

QiR^ 

PzR  \  ■ 

Then 


(22) 


Qi(i  +  A/,) 
Pi 


=  SR~'  . 


Thus,  by  (21)  and  (22), 


//(S)  =  /-S(|S‘5)“*5‘ 
=  /  -  SR-'R-*S^ 
=  I  -  . 


Thus  we  have  a  second  algorithm. 

ALGORITHM  2.  Compute  F  €  R"’**  and  €  R"*’*’’,  r  <  n,  such  that 
HE  =  7,  where  H  =  I-  GV(G'+)‘. 


1.  Find  P  €  R"*'  and  T  €  R’’’*"  such  that  R(E)  C  R(P) ,P*P  =  /,  and 
PT  =  E; 

2.  :=  polar(Pi)  where  P  =  (J‘); 

4.  R  :=  cholesky (/  + Ml); 


Thus  we  may  represent  H.,.  without  having  to  construct  sin  6  or  cos  6.  It  a 
worthwhile  asking  whether  we  can  do  the  same  for  H~.  We  seek  Gi.  such  that 
J7.  SB  /  —  G'_(G*_y.  We  may  attempt  to  repeat  the  derivation  above,  substituting 
~<7i  for  (?i.  But  there  is  a  cause  for  concern.  The  Cholesky  factor  of  /  -  Mi  and 
its  inverse  may  not  exist  or,  worse,  may  be  ill  conditioned. 

Instead  we  proceed  as  follows.  Define  P-  s  where  G-  is  given  by  (17). 
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Then  H.  =  I  -  2P_ 


'  -  2P-.  Furthermore,  using  (9), 
op  -of  Qisine  \(  gisine  V 

~^\-Q2Cose)  [-QiCoseJ 

_/  gisin2e  )foros^Q)-if  Qjsjn2©  V 

-  V-g2{2cos2e)  j  V-Q2(2cos2©)  j 

^fQiMiP-^)  fQiMzR-^y 

V  -Q2i2‘  J[  -QiR^  I 

To  summarize,  we  have 

ALGORITHM  3.  Compute  F  €  R***  and  G'_  6  R”*"*',  r  <  n,  such  that 
HE  =  F,  tt»A«re  H=I-  G'_(GL)'. 

1.  Find  F  C  R’"’‘"  and  T  €  R'^"  such  that  R{E)  C  R(P)  ,F‘P  =  /  ,  and 
PT  =  £; 

2.  !gi,A/i’  polar(Pi)  and  [g2»M2l  Polar(P2)  where  P  =  (p*); 

i.  P  :=  cholesky(/ +  A/i); 

5  O'  •=[«•"»*■' 1 

-  ■  I  -«»/?*  J  • 

Therefore  we  may  again  stably  construct  a  representation  of  without  resort 
to  sin  ©  and  cos  ©.  But  in  this  case  we  must  compute  a  polar  decomposition  (7)  of 
Pj. 

Thus,  the  essential  difference  between  the  constructions  of  G±  and  is  that 
for  G^  one  uses  two  matrix  square  roots,  to  compute  cos©  =  and 

sin©  =  ^  [(^  +  ~  while  for  G^  one  must  use  a  Cholesky  factor- 

ization  of  /  +  Afi.  Furthermore,  one  may  construct  G+  without  ever  computing  the 
factors  Q2  and  M2  of  P2,  which  is  a  distinct  advantage. 

Still  other  representations  for  H±  are  possible.  For  example, 

1.  Compute  Pi  =  QiMi  and  V  =  (/  + Afi)"*; 

2  a  -QtPi  1 

*■  1-PtVPij- 

3.3.1.  Efficiency.  Algorithms  that  are  couched  in  terms  of  modules  such  as 
the  Basic  Linear  Algebrta  Subprograms  [9],  matrix-vector  products,  and  matrix- 
matrix  products  tend  to  perform  well  on  modem  vector  and  parallel  computers 
(5].  In  fact,  very  high  speed  systolic  array  devices  can  be  used  to  implement  these 
operations.  One  advantage  of  block  reflectors  b  that  they  can  be  computed  using 
matrix  multiplication  for  most  of  the  work,  and  they  can  be  applied  using  matrix 


I 

j 

multiplication  for  all  of  the  work.  Bischof  and  Van  Loan  |l|  point  out  that  algorithms 
“rich  in  matrix  multiplication”  are  attractive  for  these  reasons.  Matrix  multiply 
.  (n  X  n)  also  has  the  extremely  important  property  that  there  is  substantial  reuse 

of  data  -  O(n^)  data  and  O(n^)  arithmetic.  It  is  therefore  possible  to  support  a 
processor  whose  speed  is  0(n)  times  greater  than  the  bandwidth  of  the  memory. 

In  Algorithm  2  above,  computation  of  a  block  reflector  requires 
[i]  Computation  of  an  orthonormal  matrix  P  such  that  E  =  FT; 

|iij  Polar  decomposition  of  an  n  x  n  matrix; 

|iii]  Cholesky  factorization  of  an  n  x  n  matrix  and  inversion  of  the  Cholesky  factor; 
[iv]  Matrbc  multiplication. 

As  applied  to  computation  of  the  block  reflector,  the  operation  counts  of  items 
[i]  and  [iv]  are  O(mn^)  and  those  of  items  [uj  and  [iiij  aire  O(n^).  We  are  especially 
interested  in  the  case  m  n. 

The  computation  of  P  (item  (ij)  could  be  done  using  a  QR  factorization  (with 
column  pivoting  if  we  wish  to  make  the  number  of  columns  of  P  as  small  as  possible). 
The  implementation  suggested  by  Bischof  and  Van  Loan,  which  is  rich  in  matrix 
multiply,  could  be  used.  In  a  later  paper,  we  shall  give  another  algorithm  for  item 
[ij  that  is  rich  in  matrix  multiply. 

Item  (iiij  is  not  matrix  multiply,  but  it  is  very  inexpensive  compared  to  the 
other  items. 

The  polar  decomposition,  item  [ii;,  can  also  be  computed  with  a  procedure 
dominated  by  matrix  multiply.  We  start  with  Higham's  method  for  the  polar  de¬ 
composition  of  a  given  nonsingular  matrix  A.  In  brief,  this  algorithm  constructs  a 
sequence  of  matrices  {Bi}  where 

Bo  =  A 

and 

BiM  =  \biBi  +  i-Br’') 

*  1% 

and  the  scalars  qr*  are  chosen  by  the  algorithm  to  accelerate  convergence.  The 
sequence  {Ft}  converges  quadratkally  to  the  orthogonal  polar  factor.  Higham  has 
shown  that  5-6  iterations  are  typically  needed  and  that  the  computation  time  is 
somewhat  less  than  that  for  the  usual  SVD-based  method  [7j. 

At  each  step,  Ff  ^  is  needed;  its  computation  dominates,  only  O(n^)  other 
work  is  done.  For  the  first  step,  the  inverse  can  be  computed  in  a  conventional 
way.  For  all  the  subsequent  iterations  of  Higham’s  method,  we  take  advantage  of 
the  fact  that  FJl\  is  a  good  a  priori  approximation  to  Ff  which  gets  better  with 
increasing  t  due  to  the  rapid  convergence  of  {F,}.  In  fact. 


-  Bill  =  0(2-»‘) . 
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It  follows,  since  Boo  =  Q  is  orthogonal  and  hence  has  condition  number  unity  that 


-  B;  ‘II  =  0(2-»')  . 

Therefore  we  use  Schulz's  iterative  method  }  12]  as  an  inner  iteration  to  compute 
B/^  This  matrix  iteration  produces  a  sequence  via 

■d*+i  =  Ak  +  (I  -  AkBt)Ak.  k  - 


where 

Ao  =  B~J^  . 

The  sequence  {Ak}  converges  to  B~^  quadratically. 

Note  that  this  method  is  entirely  matrix  multipiy-add.  Experiments  have 
shown  that  five  iterations  suffice  for  convergence  of  Higham’s  method.  The  question 
is,  how  many  iterations  of  Schulz's  method  are  required.  This  depends  on  i.  Here 
are  typical  results: 


Iteration  i 
1 
2 

3 

4 

5 


Mambf  r  q£  inngLiterMiQns-k 
6 
5 
3 
2 
1 


Thus,  about  17  Schulz  iterations,  or  34  matrix  multiplications,  are  needed  for 
the  polar  factorization.  Were  the  matrix  inverses  to  be  computed  directly,  five  such 
inverses  would  have  been  necessary.  Since  matrix  inversion  requires  as  many  floating 
point  operations  as  matrix  multiplkalion  (2n^),  the  matrix  multiply  oriented  version 
of  the  algorithm  is  more  efficient  if  matrix  multiply  can  be  done  at  a  rate  34/5  = 
6.8  times  faster  than  matrix  inversion. 

The  SVD  can  also  be  used  to  compute  a  polar  factorization.  Higham  has 
shown  that  hb  algorithm  b  somewhat  less  costly,  even  under  the  usual  model  of 
computational  cost. 

4.  Experiments.  Ten  very  ill-conditioned  12  x  8  matrices  E  were  generated 
by  the  following  procedure.  Random  matrices  U\  and  Vi  were  chosen  and  their 
columns  orthogonalized  to  produce  orthogonal  matrices  U  and  V.  Seven  random 
singular  values  were  obtained  by  sampling  the  random  variable 

c  :=  c*~^ 
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where  e  =  2  ..was  the  machine  precision  and  ft  was  uniformly  distributed  in  [0, 
1];  the  other  singular  value  was  taken  to  be  1.  Finally  we  computed 

E  =  UHV* , 


where  E  =  diag(oi, •  •  •  ,a8)>  We  then  computed  a  12  x  r  matrix  P  with  orthonormal 
columns,  and  rank(4&)  <  r  <  8.  Algorithm  2  was  used  to  find  F  and  All  com¬ 
putations  were  performed  on  an  IBM  PC/AT  using  PC  MATLAB,  which  employs 
IEEE-standard  double  precision  arithmetic,  with  15  decimal-digit  precision. 

Let 


Hi 

Hi 


Hi  e  R 


8x12 


In  each  case  we  computed  four  measures  of  error. 

1.  (nonorthogonality):  Ij/jj  -  H^H\\; 

2.  (anisometry):  \\F*F  -  E^E\\; 

3.  (correctness  of  F):  \\HiE  -  Fl|; 

4.  (correctness  of  H):  \\HiE\\. 

The  matrix  2-norm  was  used.  The  rank  of  P  varied  from  3  to  7.  The  condition 
number  of  E  was  always  at  least  2.7  x  10*®.  All  the  error  measures  were  in  the 
interval  10,.7  x  10'*^). 

Isomorphic  experiments  were  performed  using  (71  (computed  using  Algorithm 
3),  (computed  using  Algorithm  1),  and  G-  (computed  using  Algorithm  la), 
with  these  results: 

0.  Errors  using  were  <  .7  x  10“*^; 

1.  Errors  using  GL  were  <  .7  x  10“*^; 

2.  Errors  using  G+  were  <  .9  x  10“*^; 

3.  Errors  using  G_  were  <  3.0  x  10“*^, 
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